A Multistrategy Data Mining Approach to Classification
Abstract
Our research explores the use of ensemble, or multistrategy, learning techniques for inducing and managing patterns of knowledge from organizational data. Specifically, we are exploring the use of data mining techniques in building an ensemble classification system, i.e., a system that incorporates multiple machine learning techniques to generate multiple models from existing data and make predictions about new observations. Our research is motivated by a real-world business problem. The emergence of the digital personal video recorder (PVR) is expected, over time, to cause profound changes in television viewing, as viewers use the new technology to time-shift viewing and skim over or eliminate ‘in stream’ commercials. This trend is a significant threat to television advertisers and service providers, because it jeopardizes the traditional means by which advertising finances so-called ‘free’ programming. Although a number of modeling methods are potentially useful for analyzing television viewing data and classifying specific viewer types, the complexity of the domain means we cannot know a priori which methods will be most accurate in specific situations. The effectiveness of a particular method depends on a number of factors, including the characteristics of the viewer, the prevalence of target viewers in the overall population, the specific viewer attributes to be predicted, the asymmetry of misclassification costs, and other characteristics of the viewing data, such as the types of programs viewed and the time of day. Because it is unlikely that any single method could perform optimally under all of these circumstances, we are developing an ensemble classifier composed of a number of different analytic methods. This classifier would apply each of the methods to various television viewing data sets and attempt to construct a single prediction about the viewer from the collective predictions of the various methods.
We have conducted preliminary analyses of viewer data obtained from Nielsen Media Services, Inc. (NMSI), and developed an initial prototype of the data mining component from those analyses. Our initial study of viewing behavior for five target gender/age segments suggests that gains in performance are possible even with simple democratic voting schemes – i.e., where each method has a single vote. Our goal now is to determine whether we can do better by using more sophisticated combination strategies. We intend to approach the problem in two phases. The first phase will explore the combination of multiple methods in a controlled experiment using simulated data, while the second will apply lessons learned from the controlled experiment to the analysis of actual television viewing data obtained from NMSI.
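The simple democratic voting scheme mentioned above, in which each method casts a single vote, can be sketched roughly as follows. This is only an illustration under stated assumptions: the three rule-based "methods", the feature names (`sports_share`, `daytime_share`, `late_night_share`), and the segment labels are hypothetical stand-ins, not the models, attributes, or segments used in the study.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

# A "method" is any function mapping a viewer's feature record to a segment
# label. All names below are hypothetical and chosen for illustration only.
Viewer = Dict[str, float]
Method = Callable[[Viewer], str]


def by_sports_share(v: Viewer) -> str:
    # Hypothetical rule standing in for one trained model.
    return "male_18_34" if v["sports_share"] > 0.30 else "female_18_34"


def by_daytime_share(v: Viewer) -> str:
    # Hypothetical rule standing in for a second trained model.
    return "female_18_34" if v["daytime_share"] > 0.40 else "male_18_34"


def by_late_night_share(v: Viewer) -> str:
    # Hypothetical rule standing in for a third trained model.
    return "male_18_34" if v["late_night_share"] > 0.25 else "female_18_34"


def collect_votes(methods: Sequence[Method], viewer: Viewer) -> List[str]:
    """Run every method on the same viewer; each contributes exactly one vote."""
    return [method(viewer) for method in methods]


def majority_vote(methods: Sequence[Method], viewer: Viewer) -> str:
    """Return the most common label among the votes.

    Ties go to the label that was voted for first, because Counter keeps
    equal counts in first-encountered order.
    """
    votes = collect_votes(methods, viewer)
    return Counter(votes).most_common(1)[0][0]


if __name__ == "__main__":
    methods = [by_sports_share, by_daytime_share, by_late_night_share]
    viewer = {"sports_share": 0.50, "daytime_share": 0.50, "late_night_share": 0.30}
    print(collect_votes(methods, viewer))
    print(majority_vote(methods, viewer))
```

A more sophisticated combination strategy would replace `majority_vote`, for example by weighting each method's vote, while leaving the individual methods unchanged.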
Similar resources
A Methodology and Life Cycle Model for Data Mining and Knowledge Discovery in Precision Agriculture
This paper presents a methodology for data mining and knowledge discovery in large, distributed and heterogeneous databases. In order to obtain potentially interesting patterns, relationships, and rules from such large and heterogeneous data collections, it is essential that a methodology be developed to take advantage of the suite of existing methods and tools available for data mining and kno...
Customer Retention Based on the Number of Purchase: A Data Mining Approach
Purpose: This study seeks to find any relationship between the number of purchases and the income the customer brings to the company. The aim is to use data mining tools to find those customers who buy more than one life insurance policy and, at the same time, show signs of good payment behavior. Design/methodology/approach: The approach of this research is to use data mining tools...
Using Combined Descriptive and Predictive Methods of Data Mining for Coronary Artery Disease Prediction: a Case Study Approach
Heart disease is one of the major causes of morbidity in the world. Currently, large proportions of healthcare data are not processed properly, thus, failing to be effectively used for decision making purposes. The risk of heart disease may be predicted via investigation of heart disease risk factors coupled with data mining knowledge. This paper presents a model developed using combined descri...
An Integrated DEA and Data Mining Approach for Performance Assessment
This paper presents a data envelopment analysis (DEA) model combined with Bootstrapping to assess performance of one of the Data mining Algorithms. We applied a two-step process for performance productivity analysis of insurance branches within a case study. First, using a DEA model, the study analyzes the productivity of eighteen decision-making units (DMUs). Using a Malmquist index, DEA deter...
AqBC: A Multistrategy Approach for Constructive Induction
In order to obtain potentially interesting patterns and relations from large, distributed, heterogeneous databases, it is essential to employ an intelligent and automated KDD (Knowledge Discovery in Databases) process. One of the most important methodologies is an integration of diverse learning strategies that cooperatively performs a variety of techniques and achieves high quality knowledge. ...
Credit scoring in banks and financial institutions via data mining techniques: A literature review
This paper presents a comprehensive review of the work done during 2000–2012 on the application of data mining techniques in credit scoring. Yet there isn’t any literature in the field of data mining applications in credit scoring. Using a novel research approach, this paper investigates academic and systematic literature review and includes all of the journals in the Science direct onli...